#### **PipeProof (including hands-on):**

#### Verifying simpleSC across all programs



# Does hardware correctly implement ISA MCM?

Microarchitecture



Litmus Test

| Core 0                       | Core 1                      |  |
|------------------------------|-----------------------------|--|
| (i1) St [x] $\leftarrow 1$   | (i3) Ld r1 $\leftarrow$ [y] |  |
| (i2) St [y] $\leftarrow 1$   | (i4) Ld r2 $\leftarrow$ [x] |  |
| Under TSO: Forbid r1=1, r2=0 |                             |  |

SC/TSO/RISC-V MCM? (for the litmus test)

## Does hardware correctly implement ISA MCM?

Microarchitecture







## PipeCheck vs PipeProof

PipeCheck:



PipeProof:





## Why do we need PipeProof?

- Test-based verification only checks that tested programs run correctly!
- Open question: Does a suite of litmus tests cover all µarch bugs?
- Example: Remove EnforceWritePPO axiom from simpleSC
  - /home/check/pipecheck\_tutorial/uarches/SC\_fillable.uarch
  - Some orderings between same-core stores and loads removed, violating SC
  - Will bug be detected? **Depends what tests you run!**

```
Axiom "EnforceWritePPO":
```

forall microop "w",
forall microop "i",
(IsAnyWrite w /\ SameCore w i
 /\ EdgeExists((w, Fetch), (i, Fetch), "")) =>
 AddEdge ((w, Writeback), (i, Execute)).



### SimpleSC without EnforceWritePPO

| mp Litmus Test         |                    |
|------------------------|--------------------|
| Core 0                 | Core 1             |
| x = 1;                 | r1 = y;<br>r2 = x; |
| y = 1;                 | $r_{Z} = x;$       |
| Forbid: r1 = 1, r2 = 0 |                    |



sb Litmus Test

| Core 0                 | Core 1            |  |
|------------------------|-------------------|--|
| x = 1;<br>r1 = y;      | y = 1;<br>r2 = x; |  |
| Forbid: r1 = 0, r2 = 0 |                   |  |



5

### SimpleSC without EnforceWritePPO

| mp Litmus Test         |         |  |
|------------------------|---------|--|
| Core 0                 | Core 1  |  |
| x = 1;                 | r1 = y; |  |
| y = 1;                 | r2 = x; |  |
| Forbid: r1 = 1, r2 = 0 |         |  |



sb Litmus Test

| Core 0                      | Core 1            |
|-----------------------------|-------------------|
| x = 1;<br>r1 = y;           | y = 1;<br>r2 = x; |
| Forbid: $r1 = 0$ , $r2 = 0$ |                   |



## SimpleSC without EnforceWritePPO

| mp Litı | mus Test | sb Litn | nus Test |
|---------|----------|---------|----------|
| Core 0  | Core 1   | Core 0  | Core 1   |
| x = 1;  | r1 = y;  | x = 1;  | y = 1;   |
| y = 1;  | r2 = x;  | r1 = y; | r2 = x;  |

#### **Different tests catch different bugs!**

#### To catch all bugs, must verify across all programs!



# Verifying Across All Possible Programs

- Are all forbidden programs microarchitecturally unobservable?
  - If so, then microarchitecture is correct
- Infinite number of forbidden programs
  - E.g.: For SC, must check all possibilities of  $cyclic(po \cup co \cup rf \cup fr)$
- How are these ISA-level patterns related to litmus tests?



- Each forbidden litmus test is an instance of an ISA-level cycle
- PipeProof verifies the ISA-level cycles rather than litmus tests
  - Instructions in the ISA-level cycle are **symbolic** (no concrete addresses/values)
  - Verification of ISA-level cycle checks it for all possible addresses/values!

| mp                                  |                   |  |
|-------------------------------------|-------------------|--|
| Thread 0                            | Thread 1          |  |
| i1: Store [x] ← 1                   | i3: r1 = Load [y] |  |
| i2: Store [y] ← 1 i4: r2 = Load [x] |                   |  |
| SC Forbids: r1=1, r2=0              |                   |  |



- Each forbidden litmus test is an instance of an ISA-level cycle
- PipeProof verifies the ISA-level cycles rather than litmus tests
  - Instructions in the ISA-level cycle are **symbolic** (no concrete addresses/values)
  - Verification of ISA-level cycle checks it for all possible addresses/values!



- Each forbidden litmus test is an instance of an ISA-level cycle
- PipeProof verifies the ISA-level cycles rather than litmus tests
  - Instructions in the ISA-level cycle are **symbolic** (no concrete addresses/values)
  - Verification of ISA-level cycle checks it for all possible addresses/values!



- Each forbidden litmus test is an instance of an ISA-level cycle
- PipeProof verifies the ISA-level cycles rather than litmus tests
  - Instructions in the ISA-level cycle are **symbolic** (no concrete addresses/values)
  - Verification of ISA-level cycle checks it for all possible addresses/values!



## PipeProof: What's Needed

- 1. Link ISA-level MCM to microarchitectural specification
  - ISA Edge Mapping
- 2. Add universal constraints that symbolic analysis must respect
  - Theory Lemmas
- 3. A finite representation of all forbidden ISA-level cycles
  - Transitive Chain (TC) Abstraction
- 4. Automated refinement checking of the finite representation
  - Microarchitectural Correctness Proof
  - Chain invariants (for termination)



- Open /home/check/pipeproof\_tutorial/uarches/simpleSC\_fill.uarch
- Translate each edge in ISA-level cycle to microarchitectural constraints
- Do so with user-provided Mapping Axioms
- Example: Mapping of po edges



Axiom "Mapping\_po":
forall microop "i",
forall microop "j",
(HasDependency po i j =>
 AddEdge ((i, Fetch), (j, Fetch), "po\_arch", "blue")).



- Open /home/check/pipeproof\_tutorial/uarches/simpleSC\_fill.uarch
- Translate each edge in ISA-level cycle to microarchitectural constraints
- Do so with user-provided Mapping Axioms
- Example: Mapping of po edges



Axiom "Mapping\_po": Check whether a po edge
forall microop "i", from i to j exists
forall microop "j",
(HasDependency po i j =>
AddEdge ((i, Fetch), (j, Fetch), "po\_arch", "blue")).

- Open /home/check/pipeproof\_tutorial/uarches/simpleSC\_fill.uarch
- Translate each edge in ISA-level cycle to microarchitectural constraints
- Do so with user-provided Mapping Axioms
- Example: Mapping of po edges







- Open /home/check/pipeproof\_tutorial/uarches/simpleSC\_fill.uarch
- Translate each edge in ISA-level cycle to microarchitectural constraints
- Do so with user-provided Mapping Axioms
- Example: Mapping of po edges

Blue edges between EX and WB stages added by other FIFO axioms (refer to µspec file)





# Mapping Axioms Hands-on

How about mapping co (coherence order) edges?

Hint:

- *po* edge mapping was similar to **PO\_Fetch** axiom
- co edge mapping is based on WriteSerialization axiom





# Mapping Axioms Hands-on

How about mapping co (coherence order) edges?

Hint:

- *po* edge mapping was similar to **PO\_Fetch** axiom
- co edge mapping is based on WriteSerialization axiom



Axiom "Mapping\_co":
forall microop "i",
forall microop "j",
(HasDependency co i j => SamePhysicalAddress i j /\
 AddEdge ((i, Writeback), (j, Writeback), "co\_arch")).



## ISA Edge Mappings for SimpleSC

Refer to simpleSC\_fill.uarch to see mapping axioms for rf, fr





# PipeProof: What's Needed

- 1. Link ISA-level MCM to microarchitectural specification
  - ISA Edge Mapping
- 2. Add universal constraints that symbolic analysis must respect
  - Theory Lemmas
- 3. A finite representation of all forbidden ISA-level cycles
  - Transitive Chain (TC) Abstraction
- 4. Automated refinement checking of the finite representation
  - Microarchitectural Correctness Proof
  - Chain invariants (for termination)



# Symbolic Analysis Requires Theory Lemmas

- Symbolic analysis: predicates are just variables that can be true or false
  - "Theory Lemmas" necessary to enforce "universal" laws on predicates
- **Example:** Is an instruction guaranteed to be a read or write?

i: r1 = Load [x]

**Concrete:** Look at instruction -> IsAnyRead i is true



# Symbolic Analysis Requires Theory Lemmas

- Symbolic analysis: predicates are just variables that can be true or false
  - "Theory Lemmas" necessary to enforce "universal" laws on predicates
- **Example:** Is an instruction guaranteed to be a read or write?



**Concrete:** Look at instruction -> IsAnyRead i is true

**Symbolic:** We now know nothing about the instruction!

Both IsAnyRead i and IsAnyWrite i could be false! (even though this can't happen in reality)



# Symbolic Analysis Requires Theory Lemmas

- Symbolic analysis: predicates are just variables that can be true or false
  - "Theory Lemmas" necessary to enforce "universal" laws on predicates
- Example: Is an instruction guaranteed to be a read or write?



**Concrete:** Look at instruction -> IsAnyRead i is true

**Symbolic:** We now know nothing about the instruction!

Both IsAnyRead i and IsAnyWrite i could be false! (even though this can't happen in reality)

Need Additional Theory Lemma to enforce that op is either a read or write!

```
Axiom "Theory_Lemmas":
forall microop "i",
```

IsAnyRead i ∖/ IsAnyWrite i)

Theory Lemmas: Hands-on  
i: Store 
$$[x] \leftarrow 1$$
   
j: Store  $[x] \leftarrow 2$   
 $k: Store [x] \leftarrow 3$ 

**Concrete:** Directly compare instructions i and k -> **SamePhysicalAddress i k is true** 





**Concrete:** Directly compare instructions i and k -> **SamePhysicalAddress i k is true Symbolic:** co edge mapping gives **SamePhysicalAddress i j** and **SamePhysicalAddress j k** But **SamePhysicalAddress i k could be false!** (even though this can never happen in reality)





**Concrete:** Directly compare instructions i and k -> **SamePhysicalAddress i k is true Symbolic:** co edge mapping gives **SamePhysicalAddress i j** and **SamePhysicalAddress j k** But **SamePhysicalAddress i k could be false!** (even though this can never happen in reality)

**Need Additional Theory Lemma for Transitivity of SamePhysicalAddress!** 

```
Axiom "Theory_Lemmas":
forall microop "i",
...
forall microop "j",
...
forall microop "k",
(SamePhysicalAddress _ _ /\ SamePhysicalAddress _ _ =>
SamePhysicalAddress _ _)...
```



**Concrete:** Directly compare instructions i and k -> **SamePhysicalAddress i k is true Symbolic:** co edge mapping gives **SamePhysicalAddress i j** and **SamePhysicalAddress j k** But **SamePhysicalAddress i k could be false!** (even though this can never happen in reality)

**Need Additional Theory Lemma for Transitivity of SamePhysicalAddress!** 

```
Axiom "Theory_Lemmas":
forall microop "i",
...
forall microop "j",
...
forall microop "k",
(SamePhysicalAddress i j /\ SamePhysicalAddress j k =>
SamePhysicalAddress i k)...
```

# PipeProof: What's Needed

- 1. Link ISA-level MCM to microarchitectural specification
  - ISA Edge Mapping
- 2. Add universal constraints that symbolic analysis must respect
   Theory Lemmas
- 3. A finite representation of all forbidden ISA-level cycles
  - Transitive Chain (TC) Abstraction
- 4. Automated refinement checking of the finite representation
  - Microarchitectural Correctness Proof
  - Chain invariants (for termination)



# Verifying Across All Possible Programs

#### Infinite number of forbidden programs

- E.g.: For SC, must check all possibilities of  $cyclic(po \cup co \cup rf \cup fr)$
- Prove using abstractions and induction
  - Based on Counterexample-guided abstraction refinement [Clarke et al. CAV 2000]

# Verifying Across All Possible Programs

#### Infinite number of forbidden programs

- E.g.: For SC, must check all possibilities of  $cyclic(po \cup co \cup rf \cup fr)$
- Prove using abstractions and induction
  - Based on Counterexample-guided abstraction refinement [Clarke et al. CAV 2000]



All non-unary cycles containing **fr** (Infinite set)







All non-unary cycles containing **fr** (Infinite set)







Cycle = Transitive Chain (sequence) + Loopback edge (fr)



All non-unary cycles containing **fr** (Infinite set)







Cycle = Transitive Chain (sequence) + Loopback edge (fr)

# Transitive chain (sequence) of ISA-level edges





All non-unary cycles containing **fr** (Infinite set)





Cycle = Transitive Chain (sequence) + Loopback edge (fr)

ISA-level **transitive chain =>** Microarch. level **transitive connection** 



# The Transitive Chain (TC) Abstraction









### The Transitive Chain (TC) Abstraction

#### Infinite!





 $\mathbf{i}_1$ 

#### The Transitive Chain (TC) Abstraction

#### Infinite!

Finite!





I<sub>1</sub>

# Transitive Chain (TC) Abstraction Support Proof

- Ensure that ISA-level pattern and µarch. support TC Abstraction
- Base case: Do initial ISA-level edges guarantee connection?



Inductive case: Extend transitive chain => extend transitive connection?



### PipeProof: What's Needed

- 1. Link ISA-level MCM to microarchitectural specification
  - ISA Edge Mapping
- 2. Add universal constraints that symbolic analysis must respect
  - Theory Lemmas
- 3. A finite representation of all forbidden ISA-level cycles
  - Transitive Chain (TC) Abstraction
- 4. Automated refinement checking of the finite representation
  - Microarchitectural Correctness Proof
  - Chain invariants (for termination)









Acyclic graph with transitive connection =>

**Abstract Counterexample** (i.e. possible bug)



Transitive connection (green edge) may

represent one or multiple ISA-level edges



Transitive connection (green edge) may

represent one or multiple ISA-level edges

Observable





### **Refinement Loop: Concretization**

- Replaces transitive connection with a single ISA-level edge
  - All concretizations must be unobservable
  - Observable concretizations are counterexamples (bugs)



### **Refinement Loop: Concretization**

- Replaces transitive connection with a single ISA-level edge
  - All concretizations must be unobservable
  - Observable concretizations are counterexamples (bugs)



### **Refinement Loop: Concretization**

- Replaces transitive connection with a single ISA-level edge
  - All concretizations must be unobservable
  - Observable concretizations are counterexamples (bugs)





- Inductively break down transitive chain
  - Additional constraints may be enough to make execution unobservable





- Inductively break down transitive chain
  - Additional constraints may be enough to make execution unobservable



- Inductively break down transitive chain
  - Additional constraints may be enough to make execution unobservable

| factorial(n)      | = | factorial(n-1)      | * | n                 |
|-------------------|---|---------------------|---|-------------------|
|                   |   |                     |   |                   |
| Chain of length n | = | Chain of length n-1 | + | "Peeled-off" edge |





- Inductively break down transitive chain
  - Additional constraints may be enough to make execution unobservable





- Inductively break down transitive chain
  - Additional constraints may be enough to make execution unobservable



If decomposition is abstract

counterexample, repeat concretization

and decomposition!

#### Hands-on: Let's Run PipeProof!

# Assuming you are in ~/pipeproof\_tutorial/uarches/
\$ prove\_uarch -m simpleSC\_fill.uarch -i SC -n

What happens?



#### Hands-on: Let's Run PipeProof!

PipeProof does not terminate; why?

// Checking Path: (1/1, fr;)
// Checking Path: (1/1, fr;) (1/1, po;fr;)
// Checking Path: (1/1, fr;) (1/1, po;fr;) (1/1, po;po;fr;)
// Checking Path: (1/1, fr;) (1/1, po;fr;) (1/1, po;po;fr;) (1/1,
po;po;po;fr;)

• • •



- Abstractly represent repeated ISA-level patterns
- Sometimes needed for refinement loop to terminate
- Inductively proven by PipeProof before their use in proof algorithms
- Example: checking for edge from i1 to i5 (TC abstraction support proof)





- Abstractly represent repeated ISA-level patterns
- Sometimes needed for refinement loop to terminate
- Inductively proven by PipeProof before their use in proof algorithms
- Example: checking for edge from i1 to i5 (TC abstraction support proof)



- Abstractly represent repeated ISA-level patterns
- Sometimes needed for refinement loop to terminate

**Repeating ISA-Level Pattern** 

- Inductively proven by PipeProof before their use in proof algorithms
- Example: checking for edge from i1 to i5 (TC abstraction support proof)



Can continue decomposing in this way forever!

- Abstractly represent repeated ISA-level patterns
- Sometimes needed for refinement loop to terminate
- Inductively proven by PipeProof before their use in proof algorithms
- Example: checking for edge from i1 to i5 (TC abstraction support proof)



-po\_plus = arbitrary
number of repetitions of po
-Next edge peeled off will
be something other than po

26

### Adding the Chain Invariant for po+

• Uncomment the invariant at the end of simpleSC\_fill.uarch:

```
Axiom "Invariant_poplus":
forall microop "i",
forall microop "j",
HasDependency po_plus i j =>
  (AddEdge ((i, Fetch), (j, Fetch), "") /\ SameCore i j).
```

Now re-run PipeProof:

# Assuming you are in ~/pipeproof\_tutorial/uarches/
\$ prove\_uarch -m simpleSC\_fill.uarch -i SC

Should be proven in about a minute on the VM



#### **PipeProof Block Diagram**





#### PipeProof Does the Difficult Stuff for You!

- Users simply provide axioms, mappings, theory lemmas, and invariants
- PipeProof takes care of:
  - Proving TC Abstraction soundness
  - Proving any chain invariants
  - Refining abstraction (concretization and decomposition)
  - Inductively generating ISA-level cycles and covering all possibilities

#### Architects can use PipeProof; not just for formal methods experts!



### PipeProof: TSO Case Study

#### Provided in VM as solutions/simpleTSO.uarch

- Can try on your own time
- Requires additional ISA-level relations, theory lemmas, and chain invariants
- Will take at least 41 minutes to verify



#### Results

- Ran PipeProof on simpleSC (SC) and simpleTSO (TSO<sup>1</sup>) μarches
  - 3-stage in-order pipelines
- TSO verification made feasible by optimizations
  - Explicitly checking all decompositions => case explosion
  - Covering Sets Optimization (eliminate redundant transitive connections)
  - Memoization (eliminate previously checked ISA-level cycles)

|            | simpleSC  | simpleSC<br>(w/ Covering Sets + Memoization) |
|------------|-----------|----------------------------------------------|
| Total Time | 225.9 sec | 19.1 sec                                     |

|            | simpleTSO | simpleTSO<br>(w/ Covering Sets + Memoization) |
|------------|-----------|-----------------------------------------------|
| Total Time | Timeout   | 2449.7 sec (≈ 41 mins)                        |



#### **PipeProof Takeaways**

- Automated All-Program Microarchitectural MCM Verification
  - Designers no longer need to choose between completeness and automation
  - Can verify microarchitectures themselves, before RTL is written!
- Based on techniques from formal methods (CEGAR) [Clarke et al. CAV 2000]
- Transitive Chain (TC) Abstraction models infinite set of executions
- Open-source: <u>https://github.com/ymanerka/pipeproof</u>
- Accolades:
  - Nominated for Best Paper at MICRO 2018
  - "Hon. Mention" from 2018 IEEE Micro Top Picks of Comp. Arch. Conferences



#### **Backup Slides**



# **Covering Sets Optimization**

- Must verify across all possible transitive connections
- Each decomposition creates a new set of transitive connections
  - Can quickly lead to a case explosion
- The Covering Sets Optimization eliminates redundant transitive connections





# **Covering Sets Optimization**

- Must verify across all possible transitive connections
- Each decomposition creates a new set of transitive connections
  - Can quickly lead to a case explosion
- The Covering Sets Optimization eliminates redundant transitive connections

Graph A has an edge from  $x \rightarrow z$  (tran conn.)





# **Covering Sets Optimization**

- Must verify across all possible transitive connections
- Each decomposition creates a new set of transitive connections
  - Can quickly lead to a case explosion
- The Covering Sets Optimization eliminates redundant transitive connections

Graph A has an edge from x→z (tran conn.)



Graph B has edges from y→z (tran conn.) and x→z (by transitivity)



# **Covering Sets Optimization**

- Must verify across all possible transitive connections
- Each decomposition creates a new set of transitive connections
  - Can quickly lead to a case explosion
- The Covering Sets Optimization eliminates redundant transitive connections

Graph A has an edge from  $x \rightarrow z$  (tran conn.)



Graph B has edges from y→z (tran conn.) and x→z (by transitivity)

Correctness of A => Correctness of B (since B contains A's tran conn.) Checking B explicitly is redundant!



- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified





- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified







- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified







- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified

Some

Tran.



i4

EX





Same cycle is checked 3 times!

- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified





Same cycle is checked 3 times!

<u>Procedure:</u> If all ISA-level cycles containing edge r<sub>i</sub> have been checked, do not peel off r<sub>i</sub> edges when checking subsequent cycles



## The Adequate Model Over-Approximation

- Addition of an instruction can make unobservable execution observable!
- Need to work with over-approximation of microarchitectural constraints
- PipeProof sets all exists clauses to true as its over-approximation





## Filtering Invalid Decompositions

- When decomposing a transitive connection, the decomposition should guarantee the transitive connections of its parent abstract cexes.
- Decompositions that do not do this are invalid and filtered out

























execution that is observable) is often returned



